Towards Fault-tolerant HLA-based Distributed Simulations

نویسندگان

  • Dan Chen
  • Stephen John Turner
  • Wentong Cai
چکیده

Large scale High Level Architecture (HLA)-based simulations are built to study complex problems, and they often involve a large number of federates and vast computing resources. Simulation fed-erates running at different locations are subject to failure. The failure of one federate can lead to the crash of the overall simulation execution. Such risk increases with the scale of a distributed simulation. Hence, fault tolerance is required to support runtime robustness. This paper introduces a framework for robust HLA-based distributed simulations using a 'Decoupled Federate Architec-ture'. The framework provides a generic fault-tolerant model, which deals with failure with a dynamic substitution approach. A sender-based method is designed to ensure reliable in-transit message delivery, which is coupled with a novel algorithm to perform effective fossil collection. The fault-tolerant model also avoids any unnecessary repeated computation when handling failure. Using a middleware approach, the framework supports reusability of legacy federate code and it is platform-neutral and independent of federate modeling approaches. Experiments have been carried out to validate and benchmark the fault-tolerant federates using an example of a supply-chain simulation. The experimental results show that the framework provides correct failure recovery. The results also indicate that the framework only incurs minimal overhead for facilitating fault tolerance and has a promising scalability.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

2005: A Framework for Fault-Tolerance in HLA-Based Distributed Simulations

The widespread use of simulation in future military systems depends, among others, on the degree of reuse and availability of simulation models. Simulation support in such systems must also cope with failure in software or hardware. Research in fault-tolerant distributed simulation, especially in the context of the High Level Architecture (HLA), has been quite sparse. Nor does the HLA standard ...

متن کامل

Challenges in Model Checking of Fault-tolerant Designs in TLA

Although, historically, fault tolerance is connected to safetycritical systems, there has been an increasing interest in fault tolerance in mainstream application such as the cloud. There is a need for formal specification and verification of industrial fault-tolerant designs, since they integrate, in a non-trivial way, the ideas from distributed algorithms, whose correctness is usually based o...

متن کامل

Novel efficient fault-tolerant full-adder for quantum-dot cellular automata

Quantum-dot cellular automata (QCA) are an emerging technology and a possible alternative for semiconductor transistor based technologies. A novel fault-tolerant QCA full-adder cell is proposed: This component is simple in structure and suitable for designing fault-tolerant QCA circuits. The redundant version of QCA full-adder cell is powerful in terms of implementing robust digital functions. ...

متن کامل

Novel efficient fault-tolerant full-adder for quantum-dot cellular automata

Quantum-dot cellular automata (QCA) are an emerging technology and a possible alternative for semiconductor transistor based technologies. A novel fault-tolerant QCA full-adder cell is proposed: This component is simple in structure and suitable for designing fault-tolerant QCA circuits. The redundant version of QCA full-adder cell is powerful in terms of implementing robust digital functions. ...

متن کامل

Using Peer Support to Reduce Fault-Tolerant Overhead in Distributed Shared Memories

We present a peer logging system for reducing performance overhead in fault-tolerant distributed shared memory systems. Our system provides fault-tolerant shared memory using individual checkpointing and rollback. Peer logging logs DSM modification messages to remote nodes instead of to local disks. We present results for implementations of our fault-tolerant technique using simulations of both...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Simulation

دوره 84  شماره 

صفحات  -

تاریخ انتشار 2008